EFFICIENT EMULATION DISPATCH BASED ON INSTRUCTION WIDTH



TECHNICAL FIELD 

This invention relates to digital signal processors, and 
more particularly to controlling multiple instructions received 
from the emulation instruction register. 



Digital signal processing is concerned with the 
representation of signals in digital form and the transformation 
or processing of such signal representation using numerical 
computation. Digital signal processing is a core technology for 
many of today's high technology products in fields such as 
wireless communications, networking, and multimedia. One reason
for the prevalence of digital signal processing technology has 
been the development of low cost, powerful digital signal 
processors (DSPs) that provide engineers the reliable computing 
capability to implement these products cheaply and efficiently. 
Since the development of the first DSPs, DSP architecture and 
design have evolved to the point where even sophisticated real- 
time processing of video-rate sequences can be performed. 

DSPs are often used for a variety of multimedia 
applications such as digital video, imaging, and audio. DSPs
can manipulate the digital signals to create and open such
multimedia files.

MPEG-1 (Motion Picture Expert Group), MPEG-2, MPEG-4 and
H.263 are digital video compression standards and file formats.
These standards achieve a high compression rate of the digital
video signals by storing mostly changes from one video frame to
another, instead of storing each entire frame. The video
information may then be further compressed using a number of
different techniques.

The DSP may be used to perform various operations on the
video information during compression. These operations may
include motion search and spatial interpolation algorithms. The
primary intention is to measure distortion between blocks within
adjacent frames. These operations are computationally intensive
and may require high data throughput.
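As an illustration of the kind of block distortion measurement mentioned above, the following sketch computes a sum of absolute differences (SAD) between two equally sized pixel blocks. SAD is assumed here only as one commonly used distortion measure; the text does not name a specific metric, and the function name `block_sad` and the sample blocks are hypothetical.

```python
def block_sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks.

    SAD is an assumed example metric; the description does not specify one.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        # Accumulate the per-pixel absolute difference for this row.
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


# Hypothetical 2x2 blocks taken from two adjacent frames.
previous_block = [[10, 12], [14, 16]]
current_block = [[11, 12], [13, 20]]
distortion = block_sad(previous_block, current_block)
```

A motion search would evaluate such a measure for many candidate block positions, which is why these operations demand high data throughput.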

The MPEG family of standards is evolving to keep pace with
the increasing bandwidth requirements of multimedia applications
and files. Each new version of the standard presents more
sophisticated algorithms that place even greater processing
requirements on the DSPs used in MPEG compliant video processing
equipment.

Video processing equipment manufacturers often rely on 
application-specific integrated circuits (ASICs) customized for 
video encoding under the MPEG and H.263 standards. However,
ASICs are complex to design, costly to produce and less flexible
in their application than general-purpose DSPs.



These and other features and advantages of the invention will 
become more apparent upon reading the following detailed 
description and upon reference to the accompanying drawings. 

Figure 1 is a block diagram of a mobile video device 
utilizing a processor according to one embodiment of the present 
invention.

Figure 2 is a block diagram of a signal processing system 
according to an embodiment of the present invention. 

Figure 3 is a block diagram of an alternative signal 
processing system according to an embodiment of the present 
invention. 

Figure 4 illustrates exemplary pipeline stages of the 
processor in Figure 1 according to an embodiment of the present 
invention.

Figure 5 is a block diagram of an emulation system according
to one embodiment of the present invention. 

Figure 6 illustrates the process of receiving and executing
multiple instructions from the emulation instruction register
according to one embodiment of the present invention.

DETAILED DESCRIPTION



Figure 1 illustrates a mobile video device 100 including a 
processor according to an embodiment of the invention. The 
mobile video device 100 may be a hand-held device which displays 
video images produced from an encoded video signal received from 
an antenna 105 or a digital video storage medium 120, e.g., a 
digital video disc (DVD) or a memory card. A processor 110 may 
communicate with a cache memory 115 which may store instructions 
and data for the processor operations. The processor 110 may be 
a microprocessor, a digital signal processor (DSP), a 
microprocessor controlling a slave DSP, or a processor with a
hybrid microprocessor/DSP architecture. For the purposes of
this application, the processor 110 will be referred to 
hereinafter as a DSP 110. 

The DSP 110 may perform various operations on the encoded 
video signal, including, for example, analog-to-digital 
conversion, demodulation, filtering, data recovery, and 
decoding. The DSP 110 may decode the compressed digital video 
signal according to one of various digital video compression 
standards such as the MPEG-family of standards and the H.263 
standard. The decoded video signal may then be input to a 
display driver 130 to produce the video image on a display 125. 

Hand-held devices generally have limited power supplies.
Also, video decoding operations are computationally intensive.
Accordingly, a processor for use in such a device is
advantageously a relatively high speed, low power device.



The DSP 110 may have a deeply pipelined, load/store
architecture. By employing pipelining, the performance of the
DSP may be enhanced relative to a non-pipelined DSP. Instead of
fetching a first instruction, executing the first instruction,
and then fetching a second instruction, a pipelined DSP 110
fetches the second instruction concurrently with execution of
the first instruction, thereby improving instruction throughput.
Further, the clock cycle of a pipelined DSP may be shorter than
that of a non-pipelined DSP.

Such a DSP 110 may be used in video camcorders, 
teleconferencing, PC video cards, and High-Definition Television 
(HDTV). In addition, the DSP 110 may also be used in connection
with other technologies utilizing digital signal processing such 
as voice processing used in mobile telephony, speech 
recognition, and other applications. 

Turning now to Figure 2, a block diagram of a signal 
processing system 200 including DSP 110 according to an 
embodiment is shown. One or more analog signals may be provided 
by an external source, e.g., antenna 105, to a signal 
conditioner 202. Signal conditioner 202 is configured to perform 
certain preprocessing functions upon the analog signals.
Exemplary preprocessing functions may include mixing several of
the analog signals together, filtering, amplifying, etc. An
analog-to-digital converter (ADC) 204 is coupled to receive the 
preprocessed analog signals from signal conditioner 202 and to 
convert the preprocessed analog signals to digital signals 
consisting of samples, as described above. The samples are taken 
according to a sampling rate determined by the nature of the 
analog signals received by signal conditioner 202. The DSP 110 
is coupled to receive digital signals at the output of the ADC 
204. The DSP 110 performs the desired signal transformation upon 
the received digital signals, producing one or more output 
digital signals. A digital-to-analog converter (DAC) 206 is 
coupled to receive the output digital signals from the DSP 110. 
The DAC 206 converts the output digital signals into output 
analog signals. The output analog signals are then conveyed to 
another signal conditioner 208. The signal conditioner 208 
performs post-processing functions upon the output analog 
signals. Exemplary post-processing functions are similar to the 
preprocessing functions listed above. It is noted that various 
embodiments of the signal conditioners 202 and 208, the ADC 204, 
and the DAC 206 are well known. Any suitable embodiment of these 
devices may be coupled into a signal processing system 200 with 
the DSP 110. 




Attorney Docket No. 10559-286001/P9293

Turning next to Figure 3, a signal processing system 300 
according to another embodiment is shown. In this embodiment, a
digital receiver 302 may receive one or more digital signals and
convey the received digital signals to the DSP 110. As with
the embodiment shown in Figure 2, DSP 110 performs the desired 
signal transformation upon the received digital signals to 
produce one or more output digital signals. Coupled to receive 
the output digital signals is a digital signal transmitter 304. 
In one exemplary application, the signal processing system 300 
is a digital audio device in which the digital receiver 302 
conveys to the DSP 110 digital signals indicative of data stored 
on the digital storage device 120. The DSP 110 then processes 
the digital signals and conveys the resulting output digital 
signals to the digital transmitter 304. The digital transmitter 
304 then causes values of the output digital signals to be 
transmitted to the display driver 130 to produce a video image 
on the display 125. 

The pipeline illustrated in Figure 4 includes eight stages,
which may include instruction fetch 402-403, decode 404, address
calculation 405, execution 406-408, and write-back 409 stages.
An instruction i may be fetched in one clock cycle and then
operated on and executed in the pipeline in subsequent clock
cycles concurrently with the fetching of new instructions.

Pipelining may introduce additional coordination problems
and hazards to processor performance. Jumps in the program flow
may create empty slots, or "bubbles," in the pipeline.
Situations which cause a conditional branch to be taken or an
exception or interrupt to be generated may alter the sequential
flow of instructions. After such an occurrence, a new
instruction may be fetched outside of the sequential program
flow, making the remaining instructions in the pipeline
irrelevant. Methods such as data forwarding, branch prediction,
and associating valid bits with instruction addresses in the
pipeline may be employed to deal with these complexities.
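The throughput effect of an eight-stage pipeline such as the one in Figure 4 can be illustrated with a simple cycle count. The sketch below assumes an idealized model that is not stated in the text: every stage takes exactly one clock cycle, both designs run at the same clock rate, and each bubble costs one idle cycle. The function names are hypothetical.

```python
STAGES = 8  # eight pipeline stages, as described for Figure 4


def cycles_unpipelined(num_instructions, stages=STAGES):
    """Cycles needed when each instruction finishes before the next starts."""
    return num_instructions * stages


def cycles_pipelined(num_instructions, stages=STAGES, bubbles=0):
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction takes `stages` cycles to reach write-back; each
    later instruction completes one cycle after its predecessor. Each
    bubble (assumed to cost one cycle) delays completion by one cycle.
    """
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1) + bubbles
```

Under these assumptions, ten instructions need 80 cycles without pipelining and 17 cycles with it; three bubbles caused by a taken branch would raise the pipelined figure to 20.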

Figure 5 is a block diagram illustrating an emulation system 
500 according to one embodiment of the present invention. The 
emulation system 500 includes the connection of an
in-circuit-emulator (ICE) 502 to the DSP 110 through a JTAG (Joint
Test Action Group) interface 504. In-circuit-emulation uses a
peripheral device, referred to as an in-circuit-emulator (ICE)
502, that is external to a target processor system. The ICE 502
monitors the target processor's operations and can
generate real-time trace information for reconstructing
processor execution in an external host emulator. The ICE 502
may control the processor and monitor and modify the state of
the registers within the processor. An ICE 502 may include its
own ICE bus, separate from normal data, address or control 
busses found on the processor integrated circuit, so as not to 
interfere with processor behavior while the ICE 502 generates 
trace information. 

Emulation may be performed during procedures such as
debugging, hardware development, or software development using a
JTAG interface 504 as defined by the standard specified by IEEE
1149.1. Instructions that are to be executed during emulation
may be scanned in from the ICE 502 to the emulation instruction
register (EMUIR) 505 using the JTAG interface 504. The
instructions may be scanned serially from the ICE 502 to the
JTAG interface 504 through a shift register (not shown). After
the shift register is loaded from the ICE 502, the JTAG
interface 504 loads either of the instruction registers 515, 520
in the EMUIR 505 in parallel. For example, a first 64-bit
instruction may be loaded from the ICE 502 to the first
instruction register 515 and a second 64-bit instruction may be
loaded from the ICE 502 to the second instruction register 520.
Of course, each of the 64-bit instructions may include a single
instruction, or a plurality of instructions. For example, the
64-bit instructions may include a 32-bit instruction and 2
parallel 16-bit instructions.
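A minimal sketch of how one 64-bit register value could carry a 32-bit instruction together with two parallel 16-bit instructions. The field order (32-bit word in the upper half, followed by the two 16-bit words) and the helper names `pack_64` and `unpack_64` are assumptions made for illustration; the text does not specify a bit layout.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def pack_64(instr32, instr16_a, instr16_b):
    """Pack one 32-bit and two 16-bit instruction words into 64 bits.

    Assumed layout: bits [63:32] = instr32, [31:16] = instr16_a,
    [15:0] = instr16_b.
    """
    if not 0 <= instr32 <= MASK32:
        raise ValueError("instr32 does not fit in 32 bits")
    if not 0 <= instr16_a <= MASK16 or not 0 <= instr16_b <= MASK16:
        raise ValueError("16-bit instruction out of range")
    return (instr32 << 32) | (instr16_a << 16) | instr16_b


def unpack_64(word):
    """Split a 64-bit word back into (instr32, instr16_a, instr16_b)."""
    return (word >> 32) & MASK32, (word >> 16) & MASK16, word & MASK16


# Hypothetical instruction encodings, chosen only to make the fields visible.
word = pack_64(0xDEADBEEF, 0x1234, 0x5678)
```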

The first 64-bit instruction may be loaded serially into
the first instruction register 515 through the JTAG interface
in 64 clock cycles.

After the instructions are loaded into the instruction
registers 515, 520, the JTAG system may enter a run-test idle 
(RTI) state indicating that the instructions may be issued to 
the pipeline. After entering the RTI state, the first 
instruction may be issued to the pipeline. When the first 
instruction reaches the write-back stage, the second instruction 
may be issued to the pipeline. After the second instruction 
reaches write-back, the JTAG interface 504 waits for the next 
instruction. If the ICE 502 wants to repeat the first 
instruction and/or the second instruction, the instructions do 
not need to be reloaded into the instruction registers 515, 520. 
When the first or second instructions are repeated, the clock 
cycles necessary to load the instructions into the instruction 
registers 515, 520 through the JTAG interface 504 are saved. 

The RTI state allows certain operations to occur depending 
on the current instruction. Entering the RTI state consumes a 
clock cycle, and thus, slows down the emulation of the DSP 110. 
By allowing the emulation instruction register 505 to provide
multiple instructions, the DSP 110 may not need an RTI after
every instruction is executed, thus saving time.
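The saving can be made concrete with a rough cycle count. The sketch below rests on simplifying assumptions that go beyond the text: one clock cycle per bit shifted through the JTAG interface, one clock cycle per entry into the RTI state, and no other JTAG state transitions counted. The function names are hypothetical.

```python
def overhead_cycles(num_instructions, instrs_per_rti,
                    bits_per_instr=64, rti_cycles=1):
    """Scan-in plus RTI overhead, in clock cycles, for a run of instructions.

    Assumes one clock per bit shifted and one RTI entry for each group of
    `instrs_per_rti` instructions held in the emulation instruction register.
    """
    rti_entries = -(-num_instructions // instrs_per_rti)  # ceiling division
    shift_cycles = num_instructions * bits_per_instr
    return shift_cycles + rti_entries * rti_cycles


def load_cycles_saved_by_repeat(repeats, bits_per_instr=64):
    """Shift cycles avoided when an already-loaded instruction is reissued."""
    return repeats * bits_per_instr
```

With these assumptions, eight 64-bit instructions cost 520 cycles of overhead when an RTI follows every instruction and 516 cycles when two instructions share each RTI. Reissuing an already-loaded instruction three times avoids 192 shift cycles.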

The emulation system 500 according to one embodiment of the
present invention also includes emulation control logic 522, a
state machine 523, a multiplexer 525, a register 527, and a
decoder 530. The emulation control logic 522 includes the state
machine 523 and provides control signals to the instruction
registers 515, 520, the multiplexer 525, and the register 527.
The control signals from the emulation control logic control
the updates and reading of the EMUIR 505. In one embodiment,
the emulation instruction register is a 128-bit instruction
register 510, which includes a plurality of smaller instruction
registers such as the 64-bit first and second instruction
registers 515, 520. Typically, the instruction registers 515,
520 may supply one instruction at a time, with the instruction
being up to 64 bits in length. However, according to one
embodiment of the present invention, multiple instructions may
be supplied simultaneously from the 64-bit instruction registers
515, 520. As shown in Figure 5, the first instruction 515 and
the second instruction 520 may be loaded in the 64-bit
instruction register. Of course, the size of the first
instruction 515 and the second instruction 520 must not exceed
64 bits. Thus, the first instruction 515 may be a 32-bit
instruction and the second instruction 520 may be a
[illegible]-bit instruction. The first and second instructions
515, 520 may also be 16-bit or other size, provided the size of
the instructions fits into each of the 64-bit instruction
registers.



The emulation instruction register 505 provides the
contents of the instruction registers 515, 520 to the
multiplexer 525. Because the instruction registers 515, 520 may
contain a plurality of instructions, the emulation control logic
522 may control the flow of the instructions received from the
emulation instruction register 505. The emulation control logic
522 includes logic described below to supply the instructions to
the decoder 530. The state machine 523 may determine whether
the instructions are valid. The state machine 523 may then
provide these instructions to the decoder 530 via the register.
This may provide the instructions to the decoder 530 while
minimizing the disruption to the decoder 530.



The present invention is described using two 64-bit 
instruction registers providing two instructions of 64-bits or 
smaller. Of course, the invention may be accomplished on any 
size instruction register (N-bit) providing multiple 
instructions.

The process 600 for processing instructions by the 
emulation control logic 522 is shown in Figure 6. The process 
600 begins at a start block 605. Proceeding to block 610, the
process 600 waits for an RTI to begin the flow of instructions.
The RTI may come from the JTAG interface 504. Proceeding to
block 615, the process determines whether an RTI is detected.
If no RTI is detected, the process 600 proceeds along the NO
branch back to block 610 to wait for the RTI. The process 600
remains in this loop until an RTI is detected.

Returning to block 615, once an RTI is detected, the 
process proceeds along the YES branch to block 620. In block 
620, the validity of the first instruction is determined. An 
instruction may include a corresponding set of width bits
defining the validity and size of the instruction. In one
embodiment of the invention, the width bits are a 2-bit signal.
With a 2-bit signal, there are 4 possible values for the
width signal. For example, width bits of 00 indicate the
instruction is invalid, width bits of 01 indicate a 16-bit
instruction, width bits of 10 indicate a 32-bit instruction,
and width bits of 11 indicate a 64-bit instruction. By reading
the width bits, the DSP 110 may determine both the validity and
size of the instruction.
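The width-bit encoding described above can be sketched as a small lookup. The encoding values (00, 01, 10, 11) come from the description; the names `WIDTH_BITS_TO_SIZE` and `decode_width`, and the choice to report an invalid instruction as size 0, are assumptions made for illustration.

```python
# 2-bit width field -> instruction size in bits (None marks "invalid").
WIDTH_BITS_TO_SIZE = {
    0b00: None,  # instruction is invalid
    0b01: 16,
    0b10: 32,
    0b11: 64,
}


def decode_width(width_bits):
    """Return (is_valid, size_in_bits) for a 2-bit width field.

    An invalid instruction is reported with size 0 (an illustrative choice).
    """
    if width_bits not in WIDTH_BITS_TO_SIZE:
        raise ValueError("width field must be a 2-bit value (0..3)")
    size = WIDTH_BITS_TO_SIZE[width_bits]
    return (size is not None, size if size is not None else 0)
```

A single two-bit field therefore answers both questions the dispatch logic asks: whether to issue the instruction at all, and how many bits of the register it occupies.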

If the instruction is valid, the process 600 proceeds along
the YES branch to block 625. In block 625, the first
instruction flows down the pipeline for execution. Following
execution of the first instruction, the process 600 proceeds to
block 630. Returning to block 620, if the instruction is
invalid, the process 600 proceeds along the NO branch to block
630.

In block 630, the second instruction is received by the DSP 
110. Because the first and second instructions are stored in 
the emulation instruction register at the same time, the second
instruction may be retrieved without having to enter another RTI
state.

Proceeding to block 630, the validity of the second 
instruction is determined. The validity of the second 
instruction may also be determined by examination of the width 
bits as described above. If the instruction is valid, the 
process 600 proceeds along the YES branch to block 635. In 
block 635, the second instruction flows down the pipeline for 
execution. Following execution of the second instruction, the 
process 600 proceeds to block 640. Returning to block 630, if 
the instruction is invalid, the process 600 proceeds along the 
NO branch to block 640. 

In block 640, the process 600 determines whether the DSP 
110 should exit the emulation mode. The determination to exit 
the emulation mode may be provided by the emulation control 
logic 522. If further emulation is indicated, the process 
proceeds along the NO branch back to block 610 to wait for the 
next RTI. Returning to block 640, if the emulation control
logic 522 provides instructions to exit the emulation mode, the
process 600 proceeds along the YES branch to an end block 645.
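The flow of process 600 can be sketched as a small software model. This is only an illustrative simulation under stated assumptions, not the hardware logic itself: each detected RTI is represented by a tuple holding the two register entries and an exit flag, each entry is a hypothetical `(width_bits, name)` pair, and "flowing down the pipeline" is modeled by appending the name to a list.

```python
VALID_WIDTHS = {0b01: 16, 0b10: 32, 0b11: 64}  # width bits 00 mean "invalid"


def run_process_600(rti_events):
    """Model the Figure 6 flow for a sequence of detected RTI events.

    rti_events: iterable of (first, second, exit_emulation) tuples, where
    first and second are (width_bits, name) pairs (hypothetical format).
    Returns the names of the instructions issued, in order.
    """
    issued = []
    for first, second, exit_emulation in rti_events:  # blocks 610/615: RTI seen
        for width_bits, name in (first, second):      # validity checks
            if width_bits in VALID_WIDTHS:            # valid: YES branch
                issued.append(name)                   # blocks 625 / 635: issue
            # invalid: NO branch, skip to the next step without issuing
        if exit_emulation:                            # block 640: exit check
            break                                     # block 645: end
    return issued


# Hypothetical event stream: the third event is never reached.
events = [
    ((0b10, "A"), (0b00, "B"), False),
    ((0b00, "C"), (0b11, "D"), True),
    ((0b01, "E"), (0b01, "F"), False),
]
```

Both register entries are handled within one iteration of the outer loop, which mirrors the point that the second instruction is retrieved without entering another RTI state.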

Numerous variations and modifications of the invention will 
become readily apparent to those skilled in the art. 
Accordingly, the invention may be embodied in other specific 
forms without departing from its spirit or essential 
characteristics.